Greedy mixture learning for multiple motif discovery in biological sequences

Authors

  • Konstantinos Blekas
  • Dimitrios I. Fotiadis
  • Aristidis Likas
Abstract

MOTIVATION This paper studies the problem of discovering subsequences, known as motifs, that are common to a given collection of related biosequences, and proposes a greedy algorithm that learns a mixture-of-motifs model through likelihood maximization. The approach sequentially adds a new motif to the mixture model, using a combined scheme of global and local search to initialize its parameters appropriately. In addition, a hierarchical partitioning scheme based on kd-trees is presented for partitioning the input dataset in order to speed up the global search procedure. The proposed method compares favorably with the well-known MEME approach and successfully addresses several of MEME's drawbacks. RESULTS Experimental results indicate that the algorithm is advantageous in identifying larger groups of motifs characteristic of biological families with significant conservation. In addition, it offers better diagnostic capabilities by building more powerful statistical motif models with improved classification accuracy.
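The greedy idea in the abstract (grow the mixture one motif at a time, choose the new component by a global search over candidate initializations, then refine all components by local search) can be illustrated with a minimal sketch. It is not the authors' implementation and makes several assumptions of its own: DNA input, a fixed motif width `W`, a background component with fixed, position-independent letter frequencies, candidate motifs initialized from every distinct length-`W` substring (a brute-force global search standing in for the paper's kd-tree partitioning), and EM with pseudocounts as the local search. The names `add_motif`, `alpha` (initial weight of the new component) and `beta` (initial probability of the observed letter) are hypothetical.

```python
import numpy as np

ALPHABET = "ACGT"

def one_hot_windows(seqs, W):
    """All overlapping length-W substrings, one-hot encoded, shape (n, W, 4)."""
    idx = {c: i for i, c in enumerate(ALPHABET)}
    ints = np.array([[idx[c] for c in s[p:p + W]]
                     for s in seqs for p in range(len(s) - W + 1)])
    X = np.zeros(ints.shape + (4,))
    X[np.arange(len(ints))[:, None], np.arange(W), ints] = 1.0
    return X

def _logsumexp(a, axis):
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def log_mixture(X, thetas, weights):
    """Per-window log sum_k pi_k p(x | theta_k), plus the per-component terms."""
    logp = np.einsum("nwa,kwa->nk", X, np.log(thetas)) + np.log(weights)
    return _logsumexp(logp, axis=1), logp

def em(X, thetas, weights, n_iter=30, pseudo=0.1):
    """Local search: EM over all components; component 0 (background) stays fixed."""
    for _ in range(n_iter):
        lse, logp = log_mixture(X, thetas, weights)
        resp = np.exp(logp - lse[:, None])            # E-step: responsibilities (n, K)
        weights = resp.mean(axis=0)                   # M-step: mixing weights
        counts = np.einsum("nk,nwa->kwa", resp, X) + pseudo
        new = counts / counts.sum(axis=2, keepdims=True)
        new[0] = thetas[0]                            # simplification: fixed background
        thetas = new
    return thetas, weights

def add_motif(X, thetas, weights, alpha=0.05, beta=0.7, n_iter=30):
    """Greedy step: global search over substring-based candidates, then EM."""
    cands = np.unique(X.reshape(len(X), -1), axis=0).reshape(-1, X.shape[1], 4)
    cand_thetas = np.where(cands == 1.0, beta, (1.0 - beta) / 3.0)
    base, _ = log_mixture(X, thetas, weights)
    log_phi = np.einsum("nwa,cwa->nc", X, np.log(cand_thetas))
    # Log-likelihood of (1 - alpha) * f_k + alpha * phi, for every candidate at once.
    ll = np.logaddexp(np.log(1 - alpha) + base[:, None],
                      np.log(alpha) + log_phi).sum(axis=0)
    best = cand_thetas[np.argmax(ll)]
    thetas = np.concatenate([thetas, best[None]])
    weights = np.append((1 - alpha) * weights, alpha)
    return em(X, thetas, weights, n_iter=n_iter)

def consensus(theta):
    return "".join(ALPHABET[i] for i in theta.argmax(axis=1))

# Toy data: random DNA with the (hypothetical) motif TATAAT planted once per sequence.
rng = np.random.default_rng(0)
motif = "TATAAT"
seqs = []
for _ in range(20):
    s = list(rng.choice(list(ALPHABET), size=40))
    p = rng.integers(0, 40 - len(motif))
    s[p:p + len(motif)] = motif
    seqs.append("".join(s))

X = one_hot_windows(seqs, len(motif))
p0 = X.sum(axis=(0, 1)) / X.sum()                     # background letter frequencies
background = np.tile(p0, (len(motif), 1))[None]       # shape (1, W, 4)
learned_thetas, learned_weights = add_motif(X, background, np.array([1.0]))
print(consensus(learned_thetas[1]), learned_weights)
```

Calling `add_motif` repeatedly on its own output grows the mixture one motif at a time. The brute-force candidate loop costs one likelihood evaluation per distinct substring, which is exactly the cost that a partitioning scheme such as the paper's kd-trees is meant to reduce.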


Similar resources

A sequential method for discovering probabilistic motifs in proteins.

OBJECTIVES This paper proposes a greedy algorithm for learning a mixture of motifs model through likelihood maximization, in order to discover common substrings, known as motifs, from a given collection of related biosequences. METHODS The approach sequentially adds a new motif component to a mixture model by performing a combined scheme of global and local search for appropriately initializi...


Development of an Efficient Hybrid Method for Motif Discovery in DNA Sequences

This work presents a hybrid method for motif discovery in DNA sequences. The proposed method called SPSO-Lk, borrows the concept of Chebyshev polynomials and uses the stochastic local search to improve the performance of the basic PSO algorithm as a motif finder. The Chebyshev polynomial concept encourages us to use a linear combination of previously discovered velocities beyond that proposed b...


dMotifGreedy: a novel tool for de novo discovery of DNA motifs with enhanced power of reporting distinct motifs

De novo discovery of over-represented DNA motifs is one of the major challenges in computational biology. Although numerous tools have been available for de novo motif discovery, many of these tools are subject to local optima phenomena, which may hinder detection of multiple distinct motifs. A greedy algorithm based tool named dMotifGreedy was developed. dMotifGreedy begins by searching for ca...


G-SteX: Greedy Stem Extension for Free-Length Constrained Motif Discovery

Most available motif discovery algorithms in real-valued time series find approximately recurring patterns of a known length without any prior information about their locations or shapes. In this paper, a new motif discovery algorithm is proposed that has the advantage of requiring no upper limit on the motif length. The proposed algorithm can discover multiple motifs of multiple lengths at onc...


Relation between weight matrix and substitution matrix: motif search by similarity

MOTIVATION The discovery of patterns shared by several sequences that differ greatly is a basic task in sequence analysis, and still a challenge. Several methods have been developed for detecting patterns. Methods commonly used for motif search include the Gibbs sampler, Expectation-Maximization (EM) algorithm and some intuitive greedy approaches. One cannot guarantee the optimality of the resu...



Journal title:
  • Bioinformatics

Volume 19, Issue 5


Publication date: 2003